智能论文笔记

Skillful Twelve Hour Precipitation Forecasts using Large Context Neural Networks

Lasse Espeholt , Shreya Agrawal , Casper Sønderby , Manoj Kumar , Jonathan Heek , Carla Bromberg , Cenk Gazen , Jason Hickey , Aaron Bell , Nal Kalchbrenner

分类：机器学习

2021-11-14

由于其对人类生命，运输，粮食生产和能源管理的高度影响，因此在科学上研究了预测天气的问题。目前的运营预测模型基于物理学，并使用超级计算机来模拟大气预测，提前预测数小时和日期。更好的基于物理的预测需要改进模型本身，这可能是一个实质性的科学挑战，以及潜在的分辨率的改进，可以计算令人望而却步。基于神经网络的新出现的天气模型代表天气预报的范式转变：模型学习来自数据的所需变换，而不是依赖于手工编码的物理，并计算效率。然而，对于神经模型，每个额外的辐射时间都会构成大量挑战，因为它需要捕获更大的空间环境并增加预测的不确定性。在这项工作中，我们提出了一个神经网络，能够提前十二小时的大规模降水预测，并且从相同的大气状态开始，该模型能够比最先进的基于物理的模型更高的技能HRRR和HREF目前在美国大陆运营。可解释性分析加强了模型学会模拟先进物理原则的观察。这些结果代表了建立与神经网络有效预测的新范式的实质性步骤。

translated by 谷歌翻译

NFResNet: Multi-scale and U-shaped Networks for Deblurring

Tanish Mittal , Preyansh Agrawal , Esha Pahwa , Aarya Makwana

分类：计算机视觉

2022-12-12

Multi-Scale and U-shaped Networks are widely used in various image restoration problems, including deblurring. Keeping in mind the wide range of applications, we present a comparison of these architectures and their effects on image deblurring. We also introduce a new block called as NFResblock. It consists of a Fast Fourier Transformation layer and a series of modified Non-Linear Activation Free Blocks. Based on these architectures and additions, we introduce NFResnet and NFResnet+, which are modified multi-scale and U-Net architectures, respectively. We also use three different loss functions to train these architectures: Charbonnier Loss, Edge Loss, and Frequency Reconstruction Loss. Extensive experiments on the Deep Video Deblurring dataset, along with ablation studies for each component, have been presented in this paper. The proposed architectures achieve a considerable increase in Peak Signal to Noise (PSNR) ratio and Structural Similarity Index (SSIM) value.

translated by 谷歌翻译

IndicXTREME: A Multi-Task Benchmark For Evaluating Indic Languages

Sumanth Doddapaneni , Rahul Aralikatte , Gowtham Ramesh , Shreya Goyal , Mitesh M. Khapra , Anoop Kunchukuttan , Pratyush Kumar

分类：自然语言处理

2022-12-11

In this work, we introduce IndicXTREME, a benchmark consisting of nine diverse tasks covering 18 languages from the Indic sub-continent belonging to four different families. Across languages and tasks, IndicXTREME contains a total of 103 evaluation sets, of which 51 are new contributions to the literature. To maintain high quality, we only use human annotators to curate or translate\footnote{for IndicXParaphrase, where an automatic translation system is used, a second human verification and correction step is done.} our datasets. To the best of our knowledge, this is the first effort toward creating a standard benchmark for Indic languages that aims to test the zero-shot capabilities of pretrained language models. We also release IndicCorp v2, an updated and much larger version of IndicCorp that contains 20.9 billion tokens in 24 languages. We pretrain IndicBERT v2 on IndicCorp v2 and evaluate it on IndicXTREME to show that it outperforms existing multilingual language models such as XLM-R and MuRIL.

translated by 谷歌翻译

Multimodal Vision Transformers with Forced Attention for Behavior Analysis

Tanay Agrawal , Michal Balazia , Philipp Müller , François Brémond

分类：计算机视觉

2022-12-07

Human behavior understanding requires looking at minute details in the large context of a scene containing multiple input modalities. It is necessary as it allows the design of more human-like machines. While transformer approaches have shown great improvements, they face multiple challenges such as lack of data or background noise. To tackle these, we introduce the Forced Attention (FAt) Transformer which utilize forced attention with a modified backbone for input encoding and a use of additional inputs. In addition to improving the performance on different tasks and inputs, the modification requires less time and memory resources. We provide a model for a generalised feature extraction for tasks concerning social signals and behavior analysis. Our focus is on understanding behavior in videos where people are interacting with each other or talking into the camera which simulates the first person point of view in social interaction. FAt Transformers are applied to two downstream tasks: personality recognition and body language recognition. We achieve state-of-the-art results for Udiva v0.5, First Impressions v2 and MPII Group Interaction datasets. We further provide an extensive ablation study of the proposed architecture.

translated by 谷歌翻译

Walk These Ways: Tuning Robot Control for Generalization with Multiplicity of Behavior

Gabriel B Margolis , Pulkit Agrawal

分类：机器人 | 人工智能 | 机器学习

2022-12-06

Learned locomotion policies can rapidly adapt to diverse environments similar to those experienced during training but lack a mechanism for fast tuning when they fail in an out-of-distribution test environment. This necessitates a slow and iterative cycle of reward and environment redesign to achieve good performance on a new task. As an alternative, we propose learning a single policy that encodes a structured family of locomotion strategies that solve training tasks in different ways, resulting in Multiplicity of Behavior (MoB). Different strategies generalize differently and can be chosen in real-time for new tasks or environments, bypassing the need for time-consuming retraining. We release a fast, robust open-source MoB locomotion controller, Walk These Ways, that can execute diverse gaits with variable footswing, posture, and speed, unlocking diverse downstream tasks: crouching, hopping, high-speed running, stair traversal, bracing against shoves, rhythmic dance, and more. Video and code release: https://gmargo11.github.io/walk-these-ways/

translated by 谷歌翻译

In-context Examples Selection for Machine Translation

Sweta Agrawal , Chunting Zhou , Mike Lewis , Luke Zettlemoyer , Marjan Ghazvininejad

分类：自然语言处理

2022-12-05

Large-scale generative models show an impressive ability to perform a wide range of Natural Language Processing (NLP) tasks using in-context learning, where a few examples are used to describe a task to the model. For Machine Translation (MT), these examples are typically randomly sampled from the development dataset with a similar distribution as the evaluation set. However, it is unclear how the choice of these in-context examples and their ordering impacts the output translation quality. In this work, we aim to understand the properties of good in-context examples for MT in both in-domain and out-of-domain settings. We show that the translation quality and the domain of the in-context examples matter and that 1-shot noisy unrelated example can have a catastrophic impact on output quality. While concatenating multiple random examples reduces the effect of noise, a single good prompt optimized to maximize translation quality on the development dataset can elicit learned information from the pre-trained language model. Adding similar examples based on an n-gram overlap with the test source significantly and consistently improves the translation quality of the outputs, outperforming a strong kNN-MT baseline in 2 out of 4 out-of-domain datasets.

translated by 谷歌翻译

Neural Feature-Adaptation for Symbolic Predictions Using Pre-Training and Semantic Loss

Vedant Shah , Aditya Agrawal , Lovekesh Vig , Ashwin Srinivasan , Gautam Shroff , Tanmay Verlekar

分类：人工智能 | 机器学习

2022-11-29

We are interested in neurosymbolic systems consisting of a high-level symbolic layer for explainable prediction in terms of human-intelligible concepts; and a low-level neural layer for extracting symbols required to generate the symbolic explanation. Real data is often imperfect meaning that even if the symbolic theory remains unchanged, we may still need to address the problem of mapping raw data to high-level symbols, each time there is a change in the data acquisition environment or equipment. Manual (re-)annotation of the raw data each time this happens is laborious and expensive; and automated labelling methods are often imperfect, especially for complex problems. NEUROLOG proposed the use of a semantic loss function that allows an existing feature-based symbolic model to guide the extraction of feature-values from raw data, using `abduction'. However, the experiments demonstrating the use of semantic loss through abduction appear to rely heavily on a domain-specific pre-processing step that enables a prior delineation of feature locations in the raw data. We examine the use of semantic loss in domains where such pre-processing is not possible, or is not obvious. We show that without any prior information about the features, the NEUROLOG approach can continue to predict accurately even with substantially incorrect feature predictions. We show also that prior information about the features in the form of even imperfect pre-training can help correct this situation. These findings are replicated on the original problem considered by NEUROLOG, without the use of feature-delineation. This suggests that symbolic explanations constructed for data in a domain could be re-used in a related domain, by `feature-adaptation' of pre-trained neural extractors using the semantic loss function constrained by abductive feedback.

translated by 谷歌翻译

Is Conditional Generative Modeling all you need for Decision-Making?

Anurag Ajay , Yilun Du , Abhi Gupta , Joshua Tenenbaum , Tommi Jaakkola , Pulkit Agrawal

分类：机器学习 | 人工智能

2022-11-28

Recent improvements in conditional generative modeling have made it possible to generate high-quality images from language descriptions alone. We investigate whether these methods can directly address the problem of sequential decision-making. We view decision-making not through the lens of reinforcement learning (RL), but rather through conditional generative modeling. To our surprise, we find that our formulation leads to policies that can outperform existing offline RL approaches across standard benchmarks. By modeling a policy as a return-conditional diffusion model, we illustrate how we may circumvent the need for dynamic programming and subsequently eliminate many of the complexities that come with traditional offline RL. We further demonstrate the advantages of modeling policies as conditional diffusion models by considering two other conditioning variables: constraints and skills. Conditioning on a single constraint or skill during training leads to behaviors at test-time that can satisfy several constraints together or demonstrate a composition of skills. Our results illustrate that conditional generative modeling is a powerful tool for decision-making.

translated by 谷歌翻译

Indian Commercial Truck License Plate Detection and Recognition for Weighbridge Automation

Siddharth Agrawal , Keyur D. Joshi

分类：计算机视觉

2022-11-23

Detection and recognition of a licence plate is important when automating weighbridge services. While many large databases are available for Latin and Chinese alphanumeric license plates, data for Indian License Plates is inadequate. In particular, databases of Indian commercial truck license plates are inadequate, despite the fact that commercial vehicle license plate recognition plays a profound role in terms of logistics management and weighbridge automation. Moreover, models to recognise license plates are not effectively able to generalise to such data due to its challenging nature, and due to the abundant frequency of handwritten license plates, leading to the usage of diverse font styles. Thus, a database and effective models to recognise and detect such license plates are crucial. This paper provides a database on commercial truck license plates, and using state-of-the-art models in real-time object Detection: You Only Look Once Version 7, and SceneText Recognition: Permuted Autoregressive Sequence Models, our method outperforms the other cited references where the maximum accuracy obtained was less than 90%, while we have achieved 95.82% accuracy in our algorithm implementation on the presented challenging license plate dataset. Index Terms- Automatic License Plate Recognition, character recognition, license plate detection, vision transformer.

translated by 谷歌翻译

MARLIN: Masked Autoencoder for facial video Representation LearnINg

Zhixi Cai , Shreya Ghosh , Kalin Stefanov , Abhinav Dhall , Jianfei Cai , Hamid Rezatofighi , Reza Haffari , Munawar Hayat

分类：计算机视觉

2022-11-12

This paper proposes a self-supervised approach to learn universal facial representations from videos, that can transfer across a variety of facial analysis tasks such as Facial Attribute Recognition (FAR), Facial Expression Recognition (FER), DeepFake Detection (DFD), and Lip Synchronization (LS). Our proposed framework, named MARLIN, is a facial video masked autoencoder, that learns highly robust and generic facial embeddings from abundantly available non-annotated web crawled facial videos. As a challenging auxiliary task, MARLIN reconstructs the spatio-temporal details of the face from the densely masked facial regions which mainly include eyes, nose, mouth, lips, and skin to capture local and global aspects that in turn help in encoding generic and transferable features. Through a variety of experiments on diverse downstream tasks, we demonstrate MARLIN to be an excellent facial video encoder as well as feature extractor, that performs consistently well across a variety of downstream tasks including FAR (1.13% gain over supervised benchmark), FER (2.64% gain over unsupervised benchmark), DFD (1.86% gain over unsupervised benchmark), LS (29.36% gain for Frechet Inception Distance), and even in low data regime. Our codes and pre-trained models will be made public.

translated by 谷歌翻译